benchmarks for openmp parallel for skip flag #7372
|
if atomic write alone would do, then it seems the best fit |
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #7372      +/-   ##
==========================================
- Coverage   99.11%   98.90%   -0.22%
==========================================
  Files          85       86       +1
  Lines       16443    16479      +36
==========================================
  Hits        16298    16298
- Misses        145      181      +36
|
Co-authored-by: Benjamin Schwendinger <[email protected]>
|
@ben-schwen it seems that the new options are not of much use, maybe it is a compiler issue? I am on recent gcc, omp 201511. |
|
Interesting. With 20 threads I get this (which was my main motivation to include the reduction). I have gcc 11.4.0 and openmp 201511.
halt variant V1 V2 V3 V4 V5 V6 V7 V8
<char> <char> <int> <int> <int> <int> <int> <int> <int> <int>
1: 1e8+1 nothing 12500000 12500000 12500000 12500000 12500000 12500000 12500000 12500000
2: 1e8+1 volatile 12500000 12500000 12500000 12500000 12500000 12500000 12500000 12500000
3: 1e8+1 volatile+shared 12500000 12500000 12500000 12500000 12500000 12500000 12500000 12500000
4: 1e8+1 atomic write 12500000 12500000 12500000 12500000 12500000 12500000 12500000 12500000
5: 1e8+1 atomic read write 12500000 12500000 12500000 12500000 12500000 12500000 12500000 12500000
6: 1e8+1 reduction 12500000 12500000 12500000 12500000 12500000 12500000 12500000 12500000
7: 1e8+1 cancellation 12500000 12500000 12500000 12500000 12500000 12500000 12500000 12500000
8: 1e7 nothing 10000000 12500000 12500000 12500000 12500000 12500000 12500000 12500000
9: 1e7 volatile 10000000 7523756 6139470 10163068 5543296 9819239 10011390 10223716
10: 1e7 volatile+shared 10000000 9721537 10062042 2737256 10091377 2760686 9725714 9918532
11: 1e7 atomic write 10000000 10143687 9958940 10128160 5549333 5473646 9742693 10022550
12: 1e7 atomic read write 10000000 9916360 9925820 2955716 10075786 10073474 9892110 2933117
13: 1e7 reduction 10000000 12500000 12500000 12500000 12500000 12500000 12500000 12500000
14: 1e7 cancellation 12500000 12500000 12500000 12500000 12500000 12500000 12500000 12500000
15: 10 nothing 10 0 0 12500000 0 12500000 0 0
16: 10 volatile 10 0 0 0 83056 0 0 74028
17: 10 volatile+shared 10 46021 19186 55439 0 28853 75632 0
18: 10 atomic write 10 90927 14187 0 95527 36864 0 142229
19: 10 atomic read write 10 0 69753 129379 0 0 90041 0
20: 10 reduction 10 12500000 12500000 12500000 12500000 12500000 12500000 12500000
21: 10 cancellation 12500000 12500000 12500000 12500000 12500000 12500000 12500000 12500000
halt variant V1 V2 V3 V4 V5 V6 V7 V8
<char> <char> <int> <int> <int> <int> <int> <int> <int> <int>
halt variant user.self sys.self elapsed user.child sys.child
<char> <char> <num> <num> <num> <num> <num>
1: 1e8+1 nothing 0.125 0.000 0.017 0.002 0.001
2: 1e8+1 volatile 0.055 0.000 0.012 0.002 0.000
3: 1e8+1 volatile+shared 0.080 0.000 0.014 0.003 0.000
4: 1e8+1 atomic write 0.047 0.000 0.010 0.002 0.000
5: 1e8+1 atomic read write 0.080 0.000 0.014 0.002 0.000
6: 1e8+1 reduction 0.033 0.000 0.009 0.002 0.001
7: 1e8+1 cancellation 0.964 0.000 0.126 0.002 0.000
8: 1e7 nothing 0.070 0.000 0.011 0.003 0.000
9: 1e7 volatile 0.053 0.000 0.012 0.002 0.000
10: 1e7 volatile+shared 0.057 0.000 0.011 0.002 0.000
11: 1e7 atomic write 0.015 0.004 0.007 0.003 0.000
12: 1e7 atomic read write 0.098 0.000 0.014 0.001 0.001
13: 1e7 reduction 0.062 0.000 0.011 0.002 0.000
14: 1e7 cancellation 1.012 0.000 0.123 0.002 0.001
15: 10 nothing 0.031 0.000 0.008 0.002 0.001
16: 10 volatile 0.069 0.000 0.010 0.002 0.000
17: 10 volatile+shared 0.054 0.000 0.010 0.002 0.000
18: 10 atomic write 0.001 0.000 0.003 0.001 0.002
19: 10 atomic read write 0.085 0.000 0.012 0.003 0.000
20: 10 reduction 0.031 0.000 0.008 0.001 0.002
21: 10 cancellation 1.061 0.000 0.152 0.002 0.000
halt variant user.self sys.self elapsed user.child sys.child
<char> <char> <num> <num> <num> <num> <num> |
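For anyone unfamiliar with the reduction variant, here is a minimal sketch of what such a loop could look like. It is an assumption about the general pattern, not the benchmark code from this PR; `halt_with_reduction`, `halt_at` and `done_out` are hypothetical names. Since `reduction(||:halt)` gives every thread its own private copy of the flag, a thread that sets it skips its own remaining iterations, but the other threads do not see it until the loop ends. That would be consistent with the reduction rows above, where only V1 stops early.

```c
#include <stdio.h>

/* Hypothetical sketch of a "reduction" skip flag (not the PR's code).
 * Each thread works on a private copy of `halt`; the copies are OR-ed
 * together only when the parallel loop finishes.
 * Returns the combined flag and stores the number of iterations that
 * actually did work in *done_out. */
int halt_with_reduction(long n, long halt_at, long *done_out) {
  int halt = 0;
  long done = 0;
  #pragma omp parallel for reduction(||:halt) reduction(+:done)
  for (long i = 0; i < n; i++) {
    if (halt) continue;      /* only sees this thread's own flag */
    if (i == halt_at) {      /* simulated error condition */
      halt = 1;
      continue;
    }
    done++;                  /* stands in for the real work */
  }
  *done_out = done;
  return halt;
}
```

Compile with `-fopenmp` to run it in parallel; without that flag the pragmas are ignored and the loop runs serially, which is handy for checking the logic.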
|
@ben-schwen can you share also iterations made by each thread? |
|
@jangorecki I have added the iterations above. I have an |
Towards #7371
this code assumes you have 8+ threads
btw. I read that volatile should not be used for this; atomic should be preferred instead
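To illustrate the atomic alternative, here is a minimal sketch of a shared skip flag that is written with `#pragma omp atomic write` and read with `#pragma omp atomic read`. This is an assumed pattern for discussion, not the code benchmarked in this PR; `count_until_skip` and `halt_at` are hypothetical names.

```c
#include <stdio.h>

/* Hypothetical sketch of an atomic skip flag (not the PR's code).
 * One iteration sets the shared flag; every iteration checks it first
 * and skips its work once the flag is visible.
 * Returns how many iterations actually did work. */
long count_until_skip(long n, long halt_at) {
  int skip = 0;              /* shared between all threads */
  long done = 0;
  #pragma omp parallel for shared(skip) reduction(+:done)
  for (long i = 0; i < n; i++) {
    int s;
    #pragma omp atomic read
    s = skip;
    if (s) continue;         /* some thread has already asked to stop */
    if (i == halt_at) {      /* simulated error condition */
      #pragma omp atomic write
      skip = 1;
      continue;
    }
    done++;                  /* stands in for the real work */
  }
  return done;
}
```

In a parallel run the result for a given `halt_at` varies between runs, because it depends on how quickly the other threads observe the flag; only the case where the flag is never set is deterministic (`count_until_skip(n, n + 1)` returns `n`).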